Background
Genome-wide analyses, especially gene expression profiling using microarrays, have been used extensively in medical research and have led to the identification of several molecular signatures involved in various aspects of human disease pathogenesis. Individual studies have typically investigated relatively small numbers of samples, making cross-study validation a crucial step for the scientific community. Combined use of gene expression data from public repositories has proved difficult because of inherent differences in microarray platforms, protocols used in independent laboratories, experimental designs, and annotations for both genes and samples. Several methodologies have been proposed to address these issues, depending on the experimental strategies and on the biological and clinical questions. When sample phenotypes are known, statistical methods that handle data sets separately and then apply gene-wise meta-analytic approaches have proven successful, allowing the identification of statistically relevant intersections of molecular signatures from different studies (Rhodes et al., 2002; Ghosh et al., 2003; Rhodes et al., 2004; Wang et al., 2004). Advanced multilevel models are now available for this task (Conlon et al., 2007; Scharpf et al., 2009). As an alternative, the assimilation of gene expression measurements, achieved by merging the data sets, has also been used to evaluate molecular signatures obtained from different studies (Sorlie et al., 2003; Hu et al., 2005; Kapp et al., 2006; Hayes et al., 2006). Finally, we previously developed a method to evaluate cross-platform consistency of expression patterns, using integrative correlation (ICOR).